Efficient implementation of cascaded biquads

ABSTRACT

An improved biquad infinite impulse response filter is shown that may be implemented in a very long instruction word digital signal processor as well as in other processing circuitry. The new filter structure modifies the feedback path in the filter, resulting in a significant reduction in execution cycles.

TECHNICAL FIELD OF THE INVENTION

The technical field of this invention is digital signal processing, and more particularly infinite impulse response filters.

BACKGROUND OF THE INVENTION

One of the most commonly used digital filter forms is the biquad. A biquad is a second-order (two poles and two zeros) infinite impulse response (IIR) filter. Its order is high enough to be useful on its own, and because higher-order filters are more sensitive to coefficient errors, the biquad is often used as the basic building block for more complex filters. For instance, a biquad low-pass filter has a cutoff slope of 12 dB/octave, which is useful for tone controls; if a 24 dB/octave filter is needed, you can cascade two biquads, and the cascade will have fewer coefficient-sensitivity problems than a single fourth-order design.

Biquads come in several forms. The most obvious, a direct implementation of the second-order difference equation

y[n] = a0*x[n] + a1*x[n−1] + a2*x[n−2] − b1*y[n−1] − b2*y[n−2],

is called direct form I and is shown in FIG. 1.
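The difference equation maps directly onto code. The following is a minimal Python sketch for illustration, not DSP code: the function name `biquad_df1` is hypothetical, the filter state is assumed to start at zero, and the coefficients follow the equation above (a0, a1, a2 feedforward; b1, b2 feedback, subtracted).

```python
def biquad_df1(x, a0, a1, a2, b1, b2):
    """Direct form I biquad (hypothetical helper).

    y[n] = a0*x[n] + a1*x[n-1] + a2*x[n-2] - b1*y[n-1] - b2*y[n-2]
    """
    x1 = x2 = y1 = y2 = 0.0  # two input delays, two output delays, zero initial state
    y = []
    for xn in x:
        # All five products meet at a single summation point.
        yn = a0 * xn + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        x2, x1 = x1, xn  # shift the input delay line
        y2, y1 = y1, yn  # shift the output delay line
        y.append(yn)
    return y
```

Each output sample needs two stored inputs and two stored outputs, which is the four-delay structure of direct form I.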

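Direct form I is the best choice for implementation on a fixed-point processor because it has a single summation point: all products can be accumulated in one extended-precision accumulator and rounded or saturated only once.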
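To illustrate the single summation point, here is a hypothetical fixed-point variant in Python. It assumes 16-bit integer samples, coefficients quantized with 14 fraction bits, and round-to-nearest followed by saturation; these formats are illustrative choices, not taken from this document. Python integers are unbounded, so the accumulator stands in for a DSP's wide accumulator and intermediate overflow is not modeled.

```python
COEF_FRAC_BITS = 14  # assumed Q2.14 coefficient format, so |coefficient| < 2


def to_fixed(c):
    """Quantize a real coefficient to an integer with COEF_FRAC_BITS fraction bits."""
    return int(round(c * (1 << COEF_FRAC_BITS)))


def saturate16(v):
    """Clamp a value to the signed 16-bit sample range."""
    return max(-32768, min(32767, v))


def biquad_df1_fixed(x, a, b):
    """Hypothetical fixed-point direct form I on 16-bit integer samples.

    a = (a0, a1, a2) feedforward and b = (b1, b2) feedback, named as in the
    difference equation (feedback terms are subtracted).
    """
    a0, a1, a2 = (to_fixed(c) for c in a)
    b1, b2 = (to_fixed(c) for c in b)
    half = 1 << (COEF_FRAC_BITS - 1)  # rounding offset
    x1 = x2 = y1 = y2 = 0
    y = []
    for xn in x:
        # Single summation point: every product lands in one accumulator.
        acc = a0 * xn + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        # Round, rescale, and saturate exactly once per output sample.
        yn = saturate16((acc + half) >> COEF_FRAC_BITS)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y
```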

We can take direct form I and split it at the summation point as shown in FIG. 2, and then swap the two halves so that the feedback half (the poles) comes first, as shown in FIG. 3. Now one pair of z^-1 delays is redundant, because it stores the same information as the other pair. Merging the two pairs yields the direct form II configuration shown in FIG. 4.
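A minimal Python sketch of the merged structure, assuming the coefficient naming of the difference equation and a zero initial state (`biquad_df2` and the state names `w1`, `w2` are hypothetical). The feedback half computes an intermediate value w[n], and the feedforward half reads the same two delays.

```python
def biquad_df2(x, a0, a1, a2, b1, b2):
    """Direct form II biquad (hypothetical helper): one shared pair of delays."""
    w1 = w2 = 0.0  # w[n-1], w[n-2]; zero initial state
    y = []
    for xn in x:
        wn = xn - b1 * w1 - b2 * w2             # feedback half (poles) first
        y.append(a0 * wn + a1 * w1 + a2 * w2)   # feedforward half (zeros) reads the same delays
        w2, w1 = w1, wn                         # shift the single delay line
    return y
```

Only two delay elements are kept instead of the four of direct form I, which is the memory saving mentioned for floating-point use.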

In floating-point applications, direct form II is preferred because it reduces memory requirements, and floating-point computation is not sensitive to overflow in the way fixed-point computation is.

We can improve on this configuration by transposing the filter. To transpose a filter, the direction of signal flow is reversed: input and output are interchanged, distribution nodes become summers, and summers become distribution nodes, as shown in FIG. 5. The transfer function of the filter is unchanged, but in this case the floating-point behavior is better. Floating-point computation is more accurate when the values being summed are close in magnitude (adding a small number to a large one loses more precision than adding numbers of similar size).
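A minimal Python sketch of the transposed structure, under the same assumptions (zero initial state, coefficients named as in the difference equation; `biquad_tdf2`, `s1` and `s2` are hypothetical names). The two state variables hold partial sums, and the output is formed by adding a0*x[n] to the first state.

```python
def biquad_tdf2(x, a0, a1, a2, b1, b2):
    """Transposed direct form II biquad (hypothetical helper)."""
    s1 = s2 = 0.0  # partial-sum states, zero initial state
    y = []
    for xn in x:
        yn = a0 * xn + s1              # output: one addition onto the first state
        s1 = a1 * xn - b1 * yn + s2    # next first state (uses the old s2)
        s2 = a2 * xn - b2 * yn         # next second state
        y.append(yn)
    return y
```

Expanding the three assignments reproduces the difference equation term by term, so this structure has the same transfer function as direct forms I and II.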

SUMMARY OF THE INVENTION

An improved biquad filter is described that is optimized for very long instruction word (VLIW) digital signal processors. The feedback path of the filter is modified, resulting in significant performance improvements.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of this invention are illustrated in the drawings, in which:

FIG. 1 shows a direct form 1 biquad filter;

FIGS. 2 and 3 show intermediate forms of the biquad;

FIG. 4 shows a direct form 2 biquad filter;

FIG. 5 shows a transposed direct form 2 biquad filter;

FIG. 6 illustrates an implementation of a biquad filter on a DSP;

FIG. 7 shows a modified biquad implementation; and

FIG. 8 shows a comparison of the prior art and an implementation according to this invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

FIG. 6 shows the transposed direct form II structure used in some implementations on Texas Instruments digital signal processors (DSPs). This implementation requires more than 10 cycles in the feedback path. Three to 6 cycles are used in addition block 601, and 4 cycles in multipliers 602 and 603. As shown in the figure, the feedback path to multipliers 602 and 603 originates at output 604.

FIG. 7 shows the improved implementation described in this invention. The feedback path to multipliers 702 and 703 originates from the output of storage element 701 instead of the output of summation block 704. To keep the transfer function unchanged, the coefficient in multiplier 706 is changed from b1 to b1+a1, and the coefficient in multiplier 707 is changed from b2 to b2+a2. With this change the overall feedback path requires 7 cycles: 3 cycles in addition block 705 and 4 cycles in multipliers 702 and 703.
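The coefficient change follows from substituting out = in + d0 into the prior-art state updates of Table 1. This derivation uses the notation of Tables 1 and 2 (b1, b2 multiply the input, a1, a2 multiply the fed-back signal, and the leading feedforward coefficient is 1); the time index [n] is added here only for clarity.

```latex
\begin{aligned}
\text{out}[n] &= \text{in}[n] + d_0[n] \\
d_0[n+1] &= b_1\,\text{in}[n] + a_1\,\text{out}[n] + d_1[n]
          = (b_1 + a_1)\,\text{in}[n] + a_1\,d_0[n] + d_1[n] \\
d_1[n+1] &= b_2\,\text{in}[n] + a_2\,\text{out}[n]
          = (b_2 + a_2)\,\text{in}[n] + a_2\,d_0[n]
\end{aligned}
```

After the substitution, the a1 and a2 products take d0[n], the storage-element output, rather than out[n], and the input taps become b1 + a1 and b2 + a2, which is the change made to multipliers 706 and 707.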

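FIG. 8 further illustrates the implementation of this invention. Table 1 shows the signal flow in the prior art, and Table 2 shows the signal flow with the improved feedback path. In both tables, b1 and b2 multiply the input and a1 and a2 multiply the fed-back signal (the reverse of the naming in the difference equation of the background section), the feedback products are added rather than subtracted, and the leading feedforward coefficient is equal to 1.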

TABLE 1
out = in + d0
d0 = b1 * in + a1 * out + d1
d1 = b2 * in + a2 * out

TABLE 2
out = in + d0
t1 = (b1 + a1) * in + d1
t0 = a2 * d0
d0 = a1 * d0 + t1
d1 = (b2 + a2) * in + t0
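The two signal flows can be checked against each other in a short Python sketch. The function names `biquad_table1`, `biquad_table2` and `cascade` are hypothetical, the state is assumed to start at zero, and the statements follow the tables line by line (including their order, which matters because d0 and d1 are overwritten in place).

```python
def biquad_table1(x, b1, b2, a1, a2):
    """Prior-art update (Table 1): feedback taps a1, a2 are driven by `out`."""
    d0 = d1 = 0.0  # zero initial state (assumed)
    y = []
    for xin in x:
        out = xin + d0
        d0 = b1 * xin + a1 * out + d1  # uses the old d1
        d1 = b2 * xin + a2 * out
        y.append(out)
    return y


def biquad_table2(x, b1, b2, a1, a2):
    """Modified update (Table 2): feedback taps a1, a2 read the stored d0."""
    c1, c2 = b1 + a1, b2 + a2  # folded coefficients, computed once
    d0 = d1 = 0.0
    y = []
    for xin in x:
        out = xin + d0          # out no longer feeds the state update
        t1 = c1 * xin + d1
        t0 = a2 * d0            # read d0 before it is overwritten
        d0 = a1 * d0 + t1       # loop-carried chain: one multiply, one add
        d1 = c2 * xin + t0
        y.append(out)
    return y


def cascade(x, stages, section=biquad_table2):
    """Feed the output of each biquad stage into the next stage."""
    for coeffs in stages:
        x = section(x, *coeffs)
    return x
```

Both versions perform four multiplies and four additions per sample once b1 + a1 and b2 + a2 are folded, so the saving does not come from fewer operations. It comes from the shorter loop-carried chain: in Table 1, d0 passes through an addition (out), the a1 multiply, and further additions before it is stored again; in Table 2, it passes through the a1 multiply and a single addition.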

Table 3 shows performance benchmarks of the improved biquad filter executing on Texas Instruments C674x and C66x digital signal processors using single-precision 32-bit floating-point arithmetic, and Table 4 benchmarks filter performance using mixed/double-precision floating-point arithmetic on the same digital signal processors.

TABLE 3
Function | C674x exclusive cycles per biquad | C66x exclusive cycles per biquad | C66x/C674x improvement | C674x bytes | C66x bytes | Comments
Cascade biquad, 1 channel, 2-stage | 4.5 | 4 | 1.11x | 268 | 416 | C674x: loop-carried dependency bound 8, resource bound 4. C66x: loop-carried dependency bound 16, resource bound 7. Loop unroll 2x.
Cascade biquad, 2 channel, 4-stage, same coefficient | 2.125 | 1.375 | 1.35x | 1128 | 904 | C674x: loop-carried dependency bound 8, resource bound 16. C66x: loop-carried dependency bound 10, resource bound 8.
Cascade biquad, 2 channel, 3-stage, same coefficient | 2 | 1.33 | 1.34x | 536 | 656 | C674x: loop-carried dependency bound 10, resource bound 12. C66x: loop-carried dependency bound 8, resource bound 7.

TABLE 4
Cascaded biquad | C674x exclusive cycles per biquad | C66x exclusive cycles per biquad | C66x/C674x improvement | Comments
1 channel, 2-stage, single precision | 4.5 | 4 | 1.11x | C674x: loop-carried dependency bound 8, resource bound 4. C66x: loop-carried dependency bound 16, resource bound 7. Loop unroll 2x.
1 channel, 2-stage, same coefficient, mixed/double precision | 9.75 | 4 | 2.4x | C674x: loop-carried dependency bound 37, resource bound 32. C66x: loop-carried dependency bound 10, resource bound 10. Loop unroll 2x.
1 channel, 3-stage, same coefficient, mixed/double precision | 15.33 | 3.33 | 4.6x | C674x: loop-carried dependency bound 20, resource bound 24. C66x: loop-carried dependency bound 8, resource bound 9.
2 channel, 2-stage, same coefficient, mixed/double precision | 15.25 | 3.5 | 4.36x | C674x: loop-carried dependency bound 17, resource bound 32. C66x: loop-carried dependency bound 7, resource bound 14.

What is claimed is:
1. A method of performing infinite impulse response filtering, the method comprising the steps of: computing the filter output by setting out=in+d0; t1=(b1+a1)*in+d1; t0=a2*d0; d0=a1*d0+t1; d1=(b2+a2)*in+t0; where in is the filter input, a1, a2, b1, b2 are coefficients, and d0, d1, t0, t1 are intermediate results.
2. The method of claim 1, wherein: the output is computed using a digital signal processor.
3. The method of claim 2, wherein: the digital signal processor is a very long instruction word type of digital signal processor.
4. An apparatus for performing infinite impulse response filtering, the apparatus comprising: a digital signal processor operable to compute the filter output by performing the following steps: out=in+d0; t1=(b1+a1)*in+d1; t0=a2*d0; d0=a1*d0+t1; d1=(b2+a2)*in+t0; where in is the filter input, a1, a2, b1, b2 are coefficients, and d0, d1, t0, t1 are intermediate results.
5. The apparatus of claim 4, wherein: the digital signal processor is a very long instruction word type of digital signal processor.